AITopics | data engineer

Collaborating Authors

data engineer

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Redefining data engineering in the age of AI

MIT Technology ReviewOct-23-2025, 13:00:00 GMT

As AI becomes central to the enterprise, data engineers are stepping out from behind the scenes to help shape AI strategy and influence business decisions. As organizations weave AI into more of their operations, senior executives are realizing data engineers hold a central role in bringing these initiatives to life. After all, AI only delivers when you have large amounts of reliable and well-managed, high-quality data. Indeed, this report finds that data engineers play a pivotal role in their organizations as enablers of AI. And in so doing, they are integral to the overall success of the business. According to the results of a survey of 400 senior data and technology executives, conducted by MIT Technology Review Insights, data engineers have become influential in areas that extend well beyond their traditional remit as pipeline managers.

data engineer, redefining data, share story, (7 more...)

MIT Technology Review

Country:

Asia > India (0.06)
North America > United States > Massachusetts (0.05)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.57)

Add feedback

M^3Builder: A Multi-Agent System for Automated Machine Learning in Medical Imaging

Feng, Jinghao, Zheng, Qiaoyu, Wu, Chaoyi, Zhao, Ziheng, Zhang, Ya, Wang, Yanfeng, Xie, Weidi

arXiv.org Artificial IntelligenceFeb-27-2025

Agentic AI systems have gained significant attention for their ability to autonomously perform complex tasks. However, their reliance on well-prepared tools limits their applicability in the medical domain, which requires to train specialized models. In this paper, we make three contributions: (i) We present M3Builder, a novel multi-agent system designed to automate machine learning (ML) in medical imaging. At its core, M3Builder employs four specialized agents that collaborate to tackle complex, multi-step medical ML workflows, from automated data processing and environment configuration to self-contained auto debugging and model training. These agents operate within a medical imaging ML workspace, a structured environment designed to provide agents with free-text descriptions of datasets, training codes, and interaction tools, enabling seamless communication and task execution. (ii) To evaluate progress in automated medical imaging ML, we propose M3Bench, a benchmark comprising four general tasks on 14 training datasets, across five anatomies and three imaging modalities, covering both 2D and 3D data. (iii) We experiment with seven state-of-the-art large language models serving as agent cores for our system, such as Claude series, GPT-4o, and DeepSeek-V3. Compared to existing ML agentic designs, M3Builder shows superior performance on completing ML tasks in medical imaging, achieving a 94.29% success rate using Claude-3.7-Sonnet as the agent core, showing huge potential towards fully automated machine learning in medical imaging.

dataloader, dataset, json, (15 more...)

arXiv.org Artificial Intelligence

2502.20301

Country:

Asia > China > Shanghai > Shanghai (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

The Role of Accuracy and Validation Effectiveness in Conversational Business Analytics

Alparslan, Adem

arXiv.org Artificial IntelligenceNov-25-2024

This study examines conversational business analytics, an approach that utilizes AI to address the technical competency gaps that hinder end users from effectively using traditional self-service analytics. By facilitating natural language interactions, conversational business analytics aims to empower end users to independently retrieve data and generate insights. The analysis focuses on Text-to-SQL as a representative technology for translating natural language requests into SQL statements. Developing theoretical models grounded in expected utility theory, this study identifies the conditions under which conversational business analytics, through partial or full support, can outperform delegation to human experts. The results indicate that partial support, focusing solely on information generation by AI, is viable when the accuracy of AI-generated SQL queries leads to a profit that surpasses the performance of a human expert. In contrast, full support includes not only information generation but also validation through explanations provided by the AI, and requires sufficiently high validation effectiveness to be reliable. However, user-based validation presents challenges, such as misjudgment and rejection of valid SQL queries, which may limit the effectiveness of conversational business analytics. These challenges underscore the need for robust validation mechanisms, including improved user support, automated processes, and methods for assessing quality independent of the technical competency of end users.

effectiveness, query, sql query, (15 more...)

arXiv.org Artificial Intelligence

2411.12128

Country:

North America > United States (0.46)
Europe > Switzerland (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(4 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.93)
Research Report > Experimental Study (0.66)

Industry:

Government (0.46)
Health & Medicine (0.46)

Technology:

Information Technology > Databases (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(3 more...)

Add feedback

Toward a Team of AI-made Scientists for Scientific Discovery from Gene Expression Data

Liu, Haoyang, Li, Yijiang, Jian, Jinglin, Cheng, Yuxuan, Lu, Jianrong, Guo, Shuyi, Zhu, Jinglei, Zhang, Mianchen, Zhang, Miantong, Wang, Haohan

arXiv.org Artificial IntelligenceFeb-20-2024

Machine learning has emerged as a powerful tool for scientific discovery, enabling researchers to extract meaningful insights from complex datasets. For instance, it has facilitated the identification of disease-predictive genes from gene expression data, significantly advancing healthcare. However, the traditional process for analyzing such datasets demands substantial human effort and expertise for the data selection, processing, and analysis. To address this challenge, we introduce a novel framework, a Team of AI-made Scientists (TAIS), designed to streamline the scientific discovery pipeline. TAIS comprises simulated roles, including a project manager, data engineer, and domain expert, each represented by a Large Language Model (LLM). These roles collaborate to replicate the tasks typically performed by data scientists, with a specific focus on identifying disease-predictive genes. Furthermore, we have curated a benchmark dataset to assess TAIS's effectiveness in gene identification, demonstrating our system's potential to significantly enhance the efficiency and scope of scientific exploration. Our findings represent a solid step towards automating scientific discovery through large language models.

arxiv preprint arxiv, cancer, data engineer, (13 more...)

arXiv.org Artificial Intelligence

2402.12391

Country:

North America > United States > Illinois (0.04)
Europe > Poland > Greater Poland Province > Poznań (0.04)
Europe > Netherlands (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Gastroenterology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

AI-Enabled Software and System Architecture Frameworks: Focusing on smart Cyber-Physical Systems (CPS)

Moin, Armin, Badii, Atta, Günnemann, Stephan, Challenger, Moharram

arXiv.org Artificial IntelligenceAug-9-2023

Several architecture frameworks for software, systems, and enterprises have been proposed in the literature. They identified various stakeholders and defined architecture viewpoints and views to frame and address stakeholder concerns. However, the stakeholders with data science and Machine Learning (ML) related concerns, such as data scientists and data engineers, are yet to be included in existing architecture frameworks. Therefore, they failed to address the architecture viewpoints and views responsive to the concerns of the data science community. In this paper, we address this gap by establishing the architecture frameworks adapted to meet the requirements of modern applications and organizations where ML artifacts are both prevalent and crucial. In particular, we focus on ML-enabled Cyber-Physical Systems (CPSs) and propose two sets of merit criteria for their efficient development and performance assessment, namely the criteria for evaluating and benchmarking ML-enabled CPSs, and the criteria for evaluation and benchmarking of the tools intended to support users through the modeling and development pipeline. In this study, we deploy multiple empirical and qualitative research methods based on literature review and survey instruments including expert interviews and an online questionnaire. We collect, analyze, and integrate the opinions of 77 experts from more than 25 organizations in over 10 countries to devise and validate the proposed framework.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2308.05239

Country:

North America > United States > New York > New York County > New York City (0.05)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Europe > Belgium > Flanders > Antwerp Province > Antwerp (0.04)
(7 more...)

Genre:

Research Report > New Finding (1.00)
Questionnaire & Opinion Survey (1.00)
Research Report > Experimental Study (0.93)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)
Education (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Software Engineering (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
(5 more...)

Add feedback

Data Engineer - Managed service at Version 1 - Edinburgh, United Kingdom

#artificialintelligenceApr-16-2023, 00:05:42 GMT

We pledge "to prove IT can make a real difference to our customer's businesses". We work hard to ensure we understand what our customers need from their technology solutions and then we deliver. We are an award-winning company who provide world class customer service; we think big and we hire great people. Version 1 are more than just another IT services company - we are leaders in implementing and supporting Oracle, Microsoft and AWS technologies. Version 1 is looking for an experienced data engineer join its Managed Services team.

data engineer, united kingdom, version 1, (1 more...)

#artificialintelligence

Country: Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.40)

Industry: Information Technology > Services (0.64)

Technology: Information Technology > Artificial Intelligence (0.40)

Add feedback

Data Engineer at DAZN - Hyderabad, India

#artificialintelligenceApr-15-2023, 03:12:17 GMT

Are you an engineer who loves to make things that just work better? Do you love to work with cutting edge technologies and think about how can this run faster, be deployed quicker or fail less and deliver killer streaming applications that add business value and stick with customers? DAZN is a tech-first sport streaming platform that reaches millions of users every week. We are challenging a traditional industry and giving power back to the fans. Our new Hyderabad tech hub will be the engine that drives us forward to the future.

data engineer, dazn, kinesis, (9 more...)

#artificialintelligence

Country: Asia > India > Telangana > Hyderabad (0.40)

Technology: Information Technology > Artificial Intelligence (0.40)

Add feedback

Sr. Data Engineer - Cloud - Remote (Gurgaon) at Aryng - Gurugram, Haryana, India - Remote

#artificialintelligenceApr-15-2023, 03:12:11 GMT

Aryng is hiring for Full Time Sr. Data Engineer - Cloud - Remote (Gurgaon) - Gurugram, Haryana, India - Remote - a Mid-level AI, ML, Data Science role offering benefits such as Competitive pay, Flat hierarchy, Flex hours

aryng, data engineer, remote, (7 more...)

#artificialintelligence

Country: Asia > India > Haryana (0.60)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence (0.69)
Information Technology > Architecture > Real Time Systems (0.48)

Add feedback

Sr. Data Engineer - ASM Analytics at Visa - Bengaluru, India

#artificialintelligenceApr-13-2023, 03:10:41 GMT

Visa is a world leader in digital payments, facilitating more than 215 billion payments transactions between consumers, merchants, financial institutions and government entities across more than 200 countries and territories each year. Our mission is to connect the world through the most innovative, convenient, reliable and secure payments network, enabling individuals, businesses and economies to thrive. When you join Visa, you join a culture of purpose and belonging – where your growth is priority, your identity is embraced, and the work you do matters. We believe that economies that include everyone everywhere, uplift everyone everywhere. Your work will have a direct impact on billions of people around the world – helping unlock financial access to enable the future of money movement.

analytic, data engineer, visa, (7 more...)

#artificialintelligence

Country: Asia > India > Karnataka > Bengaluru (0.40)

Industry:

Banking & Finance (0.93)
Information Technology (0.57)

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Data Science > Data Mining (0.81)

Add feedback

Machine Learning Tutorial with Python, Jupyter, KSQL and TensorFlow

#artificialintelligenceApr-12-2023, 07:15:19 GMT

When Michelangelo started, the most urgent and highest impact use cases were some very high scale problems, which led us to build around Apache Spark (for large-scale data processing and model training) and Java (for low latency, high throughput online serving). This structure worked well for production training and deployment of many models but left a lot to be desired in terms of overhead, flexibility, and ease of use, especially during early prototyping and experimentation [where Notebooks and Python shine]. Uber expanded Michelangelo "to serve any kind of Python model from any source to support other Machine Learning and Deep Learning frameworks like PyTorch and TensorFlow [instead of just using Spark for everything]." So why did Uber (and many other tech companies) build its own platform and framework-independent machine learning infrastructure? The posts How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka and Using Apache Kafka to Drive Cutting-Edge Machine Learning describe the benefits of leveraging the Apache Kafka ecosystem as a central, scalable, and mission-critical nervous system. It allows real-time data ingestion, processing, model deployment, and monitoring in a reliable and scalable way. This post focuses on how the Kafka ecosystem can help solve the impedance mismatch between data scientists, data engineers, and production engineers. By leveraging it to build your own scalable machine learning infrastructure and also make your data scientists happy, you can solve the same problems for which Uber built its own ML platform, Michelangelo.

data scientist, engineer, python, (13 more...)

#artificialintelligence

Industry: Information Technology (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback